Abstract

It is shown that all the Frequentist methods are equivalent from a statistical
point of view, but the physical significance of the confidence intervals depends
on the method. The Bayesian Ordering method is presented and compared
with the Unified Approach in the case of a Poisson process with background.
Some criticisms of both methods are answered. It is also argued that a general
Frequentist method is not needed.

1. Introduction

In this report I will be concerned mainly with the Frequentist (classical) theory of statistical inference,
but I think that it is interesting and useful to state my opinion on the war between Frequentists and
Bayesians. To the question

"Are you Frequentist or Bayesian"? 


I think that if one likes statistics, one can appreciate the beauty of both Frequentist and Bayesian theories 
and the subtleties involved in their formulation and application. I think that both approaches are valid 
from a statistical as well as physical point of view. Their difference arises from different definitions 
of probability and their results answer different statistical questions. One may prefer one of the two
theories, but I think that it is unreasonable to claim that only one of them is correct, as some partisans
of each theory do. These partisans often produce examples in which the other approach is shown to
yield misleading or paradoxical results. I think that each theory should be appreciated and used in its 
limited range of validity, in order to answer the appropriate questions. Finding some example in which 
one approach fails does not disprove its correctness in many other cases that lie in its range of validity. 

My impression is that the Bayesian theory (see, for example, [1]) has a wider range of validity
because it can be applied to cases in which the experiment can be done only once or a few times (for
example, our thoughts in everyday decisions and judgments seem to follow an approximate Bayesian
method). In these cases the Bayesian definition of probability as degree of belief seems to me the only
one that makes sense and is able to provide meaningful results.

Let me recall that since Galileo an accepted basis of scientific research is the repeatability of
experiments. This assumption justifies the Frequentist definition of probability as the ratio of the number of
positive cases to the total number of trials in a large ensemble. The concept of coverage follows immediately:
a $100\alpha\%$ confidence interval for a physical quantity $\mu$ is an interval that contains (covers) the
unknown true value of that quantity with a Frequentist probability $\alpha$. In other words, a $100\alpha\%$
confidence interval for $\mu$ belongs to a set of confidence intervals that can be obtained with a large ensemble
of experiments, $100\alpha\%$ of which contain the true value of $\mu$.

2. The statistical and physical significance of confidence intervals

I think that in order to fully appreciate the meaning and usefulness of Frequentist confidence intervals
obtained with Neyman's method [§, 3], it is important to understand that the experiments in the ensemble
do not need to be identical, as often stated, or even similar, but can be real, different experiments [||, |J].
One can understand this property in a simple way by considering, for example, two different experiments
that measure the same physical quantity $\mu$. The $100\alpha\%$ classical confidence interval obtained
from the results of each experiment belongs by construction to a set of confidence intervals which can be
obtained with an ensemble of identical experiments and contain the true value of $\mu$ with probability $\alpha$.
It is clear that the union of these two sets of confidence intervals, containing the two confidence intervals
obtained in the two different experiments, is still a set of confidence intervals that contain the true value
of $\mu$ with probability $\alpha$.

Moreover, for the same reasons it is clear that the results of different experiments can also be
analyzed with different Frequentist methods, i.e. methods with correct coverage but different prescriptions
for the construction of the confidence belt. This for me is amazing and beautiful: whatever
method you choose you get a result that can be compared meaningfully with the results obtained by
different experiments using different methods! It is important to realize, however, that the choice of the
Frequentist method must be made independently of the knowledge of the data (before looking at the data),
otherwise the property of coverage is lost, as in the "flip-flop" example in Ref. [Q].
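The pooling argument above can be checked with a small Monte Carlo sketch. It is only a toy under assumptions that are not in the text: a Gaussian measurement instead of a Poisson process, two hypothetical experiments with resolutions 1 and 2, one quoting 90% central intervals and the other 90% upper limits, and a method assignment fixed before any data are drawn.

```python
import random

# Standard normal quantiles for 90% coverage (well-known values).
Z90_CENTRAL = 1.6449  # two-sided: |x - mu| < 1.6449 sigma
Z90_UPPER = 1.2816    # one-sided: mu < x + 1.2816 sigma


def central_interval(x, sigma):
    """90% central confidence interval for a Gaussian measurement x."""
    return (x - Z90_CENTRAL * sigma, x + Z90_CENTRAL * sigma)


def upper_limit_interval(x, sigma):
    """90% upper-limit confidence interval for a Gaussian measurement x."""
    return (float("-inf"), x + Z90_UPPER * sigma)


def pooled_coverage(mu_true, n_trials=100_000, seed=1):
    """Fraction of intervals covering mu_true in a mixed ensemble.

    Even-numbered trials are a hypothetical experiment with sigma = 1 that
    quotes central intervals; odd-numbered trials are a different experiment
    with sigma = 2 that quotes upper limits. The assignment does not depend
    on the data, as required for coverage.
    """
    rng = random.Random(seed)
    covered = 0
    for i in range(n_trials):
        if i % 2 == 0:
            lo, hi = central_interval(rng.gauss(mu_true, 1.0), 1.0)
        else:
            lo, hi = upper_limit_interval(rng.gauss(mu_true, 2.0), 2.0)
        covered += lo <= mu_true <= hi
    return covered / n_trials


if __name__ == "__main__":
    print(pooled_coverage(3.0))
```

The pooled fraction should stay close to 0.90 for any true value, which is the sense in which intervals from different experiments and different methods remain statistically comparable.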

This property allows us to solve an apparent paradox that follows from the recent proliferation of
proposed Frequentist methods [0, |j, ^, 10, lTJ, 12]. This proliferation seems to introduce a large degree of
subjectivity in the Frequentist approach, supposed to be objective, due to the need to choose one specific
prescription for the construction of the confidence belt, among several available with similar properties.
From the property above, we see that whatever Frequentist method one chooses, if implemented correctly,
the resulting confidence interval can be compared statistically with the confidence intervals of other
experiments obtained with other Frequentist methods. Therefore, the subjective choice of a specific
Frequentist method does not have any effect from a statistical point of view!

Then you should ask me: 

Why are you proposing a specific Frequentist method? 

The answer lies in physics, not statistics. It is well known that the statistical analysis of the same data
with different Frequentist methods produces different confidence intervals. This difference is sometimes
crucial for the physical interpretation of the result of the experiment (see, for example, [g, 10]). Hence,
the physical significance of the confidence intervals obtained with different Frequentist methods is sometimes
crucially different. In other words, the Frequentist method suffers from a degree of subjectivity from
a physical, not statistical, point of view.

3. The beauty of the Unified Approach and its pitfalls 

The possibility to apply successfully Frequentist statistics to problematic cases in frontier research has 
received a fundamental contribution with the proposal of the Unified Approach by Feldman and Cousins 
[^]. The Unified Approach consists in a clever prescription for the construction of "a classical confidence 
belt which unifies the treatment of upper confidence limits for null results and two-sided confidence 
intervals for non-null results". 

In the following I will consider the case of a Poisson process with signal $\mu$ and known background
$b$. The probability to observe $n$ events is

$$ P(n|\mu,b) = \frac{(\mu+b)^n \, e^{-(\mu+b)}}{n!} \,. \qquad (1) $$

The Unified Approach is based on the construction of the acceptance intervals $[n_1(\mu), n_2(\mu)]$
ordering the $n$'s through their rank given by the relative magnitude of the likelihood ratio

$$ R(n,\mu,b) = \frac{P(n|\mu,b)}{P(n|\mu_{\rm best},b)} = \left( \frac{\mu+b}{\mu_{\rm best}+b} \right)^n e^{\mu_{\rm best}-\mu} \,, \qquad (2) $$

where $\mu_{\rm best}$ is the maximum likelihood estimate of $\mu$,

$$ \mu_{\rm best}(n,b) = {\rm Max}[0, n-b] \,. \qquad (3) $$

As a result of this construction the confidence intervals are two-sided (i.e. $[\mu_{\rm low}, \mu_{\rm up}]$ with $\mu_{\rm low} > 0$) for
$n > b$, whereas for $n < b$ they are upper limits (i.e. $\mu_{\rm low} = 0$).
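The ordering prescription of Eqs. (1)-(3) can be sketched in a few lines. This is an illustrative sketch, not the code used for the figures: the function names, the truncation of $n$ at a finite `n_max`, and the convention of adding values of $n$ until the summed probability first reaches $\alpha$ are assumptions made here.

```python
import math


def poisson_pmf(n, mean):
    """Poisson probability of Eq. (1), with mean = mu + b."""
    if mean == 0.0:
        return 1.0 if n == 0 else 0.0
    # Work with logarithms to avoid overflow for large n.
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))


def likelihood_ratio(n, mu, b):
    """Likelihood ratio of Eq. (2) with mu_best from Eq. (3)."""
    mu_best = max(0.0, n - b)
    return poisson_pmf(n, mu + b) / poisson_pmf(n, mu_best + b)


def acceptance_interval(mu, b, alpha=0.90):
    """Return (n1, n2) of the acceptance interval for a given mu.

    Values of n are added in decreasing order of the likelihood ratio
    until the summed probability reaches alpha. The accepted set is
    reported through its smallest and largest element.
    """
    mean = mu + b
    n_max = int(mean + 10.0 * math.sqrt(mean) + 20.0)  # assumed safe cutoff
    ranked = sorted(range(n_max + 1),
                    key=lambda n: likelihood_ratio(n, mu, b),
                    reverse=True)
    accepted, total = [], 0.0
    for n in ranked:
        accepted.append(n)
        total += poisson_pmf(n, mean)
        if total >= alpha:
            break
    return min(accepted), max(accepted)


if __name__ == "__main__":
    for mu in (0.0, 1.0, 3.0):
        print(mu, acceptance_interval(mu, b=5.0))
```

For $\mu = 0$ and $b = 5$ all $n \le 5$ have ratio exactly one and enter first, so the acceptance interval starts at $n = 0$.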

The fact that the confidence intervals are two-sided for $n > b$ can be understood by considering
$n > b$, which gives $\mu_{\rm best} = n - b$. In this case the likelihood ratio (2) is given by

$$ R(n>b,\mu,b) = \left( \frac{\mu+b}{n} \right)^n e^{n-(\mu+b)} = \exp\{ n [ 1 + \ln(\mu+b) - \ln n ] - (\mu+b) \} \to 0 \quad (n \to \infty) \,. \qquad (4) $$

This implies that the rank of high values of $n$ is very low and they are excluded from the confidence
belt. Therefore, the acceptance intervals $[n_1(\mu), n_2(\mu)]$ are always bounded, i.e. $n_2(\mu)$ is finite, and
the confidence intervals are two-sided for $n > b$, as illustrated in Fig. 1, where the solid lines show the
borders of the confidence belt for a background $b = 5$ and a confidence level $\alpha = 0.90$.

The fact that the confidence intervals are upper limits for $n < b$ can be understood by considering
$n \le b$, for which we have $\mu_{\rm best} = 0$ and the likelihood ratio that determines the ordering of the $n$'s in
the acceptance intervals is given by

$$ R(n \le b,\mu,b) = \left( 1 + \frac{\mu}{b} \right)^n e^{-\mu} \,. \qquad (5) $$

Considering now the acceptance interval for $\mu = 0$, we have $R(n \le b, \mu = 0, b) = 1$. Therefore, all
$n \le b$ for $\mu = 0$ have highest rank and are guaranteed to lie in the confidence belt. This is illustrated
in Fig. 1, where the thick solid segment shows the $n \le b$ part of the acceptance interval for $\mu = 0$, that
must lie in the confidence belt. Since $\mu$ is a continuous parameter, also for small values of $\mu$ the $n \le b$
have rank close to the highest one and lie in the confidence belt. Indeed, for $\mu > 0$, the likelihood ratio
(2) increases for $n$ going from zero to the largest integer smaller or equal to $b$ and decreases for larger
values of $n$. Hence, the largest integer $n_{\rm hr}$ such that $n_{\rm hr} \le b$ has highest rank. If $\mu$ is sufficiently small
all $n \le b$ have rank close to maximum and are included in the confidence belt if the confidence level is
large enough, $\alpha \gtrsim 0.60$. For example, $R(n = 0, \mu, b) > R(n_{\rm hr} + 1, \mu, b)$ for $\mu < (1+b) \, e^{-1/(1+b)} - b$.
Therefore, the left edge of the confidence belt must change its slope for $n \le b$ and intercept the $\mu$-axis at
a positive value of $\mu$, as illustrated in Fig. 1. The value of $\mu$ at which the left edge of the confidence belt
intercepts the $\mu$-axis, that corresponds to $\mu_{\rm up}(n = 0)$, depends on the value of the background $b$ and on
the value of the confidence level $\alpha$.

Fig. 3: 90% CL upper limit $\mu_{\rm up}$ as a function of the background $b$ for $n = 0$ (lower lines), ..., $n = 5$ (upper lines). The solid
part of each line shows where $b > n$.
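The bound on $\mu$ quoted in the example above can be recovered in two lines from Eq. (2). The sketch below assumes, for simplicity, that $b$ is an integer, so that $n_{\rm hr} = b$ and $\mu_{\rm best} = 1$ for $n = n_{\rm hr} + 1$; this restriction is an assumption made here.

```latex
\begin{align*}
R(0,\mu,b) &= e^{-\mu} \,, \\
R(b+1,\mu,b) &= \left( \frac{\mu+b}{1+b} \right)^{b+1} e^{1-\mu} \,, \\
R(0,\mu,b) > R(b+1,\mu,b)
  &\iff \left( \frac{\mu+b}{1+b} \right)^{b+1} < e^{-1}
  \iff \mu < (1+b) \, e^{-1/(1+b)} - b \,.
\end{align*}
```

For $b = 5$ the right-hand side is about 0.079, so the condition holds only for rather small values of $\mu$.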

However,¹ for small values of $\alpha$ the Unified Approach gives zero-width confidence intervals for
$n \ll b$, as illustrated in Fig. 2, where I have chosen $b = 5$ and $\alpha = 0.50$. One can see that the segment
$n \le b$ is enclosed in the confidence belt for $\mu = 0$, but for any value of $\mu > 0$ the sum of the probabilities
of the $n$'s close to $\mu + b$ is enough to reach the confidence level and low values of $n$ are not included in
the confidence belt. Hence, in this case the Unified Approach gives zero-width confidence intervals for
$n \le 2$.

The unification of the treatments of upper confidence limits for null results and two-sided confidence
intervals for non-null results obtained with the Unified Approach is wonderful, but it has been
noticed that the upper limits obtained with the Unified Approach for $n < b$ are too stringent (meaningless)
from a physical point of view [||, 13]. In other words, although these limits are statistically correct
from a Frequentist point of view, they cannot be taken as reliable upper bounds to be used in physical
applications.

This problem is illustrated in Fig. 3A, where I plotted the 90% CL upper limit $\mu_{\rm up}$ as a function
of $b$ for $n = 0, \ldots, 5$. The solid part of each line shows where $b > n$. One can see that for a given $n$,
$\mu_{\rm up}$ decreases rather steeply when $b$ is increased, until a minimum value close to one is reached. The
curves have jumps because $n$ is an integer and generally the desired confidence level cannot be obtained
exactly, but only with some unavoidable overcoverage.
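The dependence of $\mu_{\rm up}$ on $b$ described above can be explored with a brute-force scan over $\mu$. This is a rough sketch under stated assumptions: the grid step, the scan range `mu_max`, the cutoff on $n$, and the choice of reporting the largest scanned $\mu$ whose acceptance set contains the observed $n$ are all choices made here, and no smoothing of the resulting limits is applied.

```python
import math


def poisson_pmf(n, mean):
    """Poisson probability of Eq. (1), with mean = mu + b."""
    if mean == 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))


def accepted_set(mu, b, alpha):
    """Set of n accepted for a given mu with the ordering of Eq. (2)."""
    mean = mu + b
    n_max = int(mean + 10.0 * math.sqrt(mean) + 20.0)  # assumed safe cutoff

    def rank(n):
        mu_best = max(0.0, n - b)  # Eq. (3)
        return poisson_pmf(n, mean) / poisson_pmf(n, mu_best + b)

    accepted, total = set(), 0.0
    for n in sorted(range(n_max + 1), key=rank, reverse=True):
        accepted.add(n)
        total += poisson_pmf(n, mean)
        if total >= alpha:
            break
    return accepted


def upper_limit(n_obs, b, alpha=0.90, mu_max=15.0, step=0.01):
    """Largest mu on the grid whose acceptance set contains n_obs."""
    mu_up = 0.0
    for i in range(int(round(mu_max / step)) + 1):
        mu = i * step
        if n_obs in accepted_set(mu, b, alpha):
            mu_up = mu
    return mu_up


if __name__ == "__main__":
    for b in (0.0, 1.0, 2.0, 5.0):
        print(b, upper_limit(0, b))
```

For $n = 0$ and $b = 0$ the scan returns a value close to the familiar 2.44, and the limit shrinks as $b$ grows, which is the behavior discussed in the text.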

Let me emphasize that the problem of obtaining too stringent upper limits for $n < b$ is very serious
for a scientist who wants to obtain reliable information from an experiment and use this information for other
purposes (as input for a theory or another experiment). In the past, researchers bearing the same physical
point of view refrained from reporting empty confidence intervals or very stringent upper limits when $n < b$
was measured. These confidence intervals are correct from a statistical point of view, but useless from a
physical point of view. Furthermore, the same reasoning leads one to prefer the Unified Approach to central
confidence intervals or upper limits, because the non-empty confidence interval obtained when $n < b$ is
measured is certainly more significant, from a physical point of view, than an empty one, although they
are statistically equivalent, as shown in Section 2.

1 Let me emphasize that I discuss this case only for the sake of curiosity. It is pretty obvious that a low value of $\alpha$ is devoid
of any practical interest.

Fig. 4: 90% confidence belts for $b = 10$ in the Unified Approach ($\mu_{\rm best}^{\rm min} = 0$, solid lines) and in the Brutally Modified
Unified Approach (BMUA) for $\mu_{\rm best}^{\rm min} = 1$ (dashed lines) and $\mu_{\rm best}^{\rm min} = 2$ (dotted lines).

Fig. 5: 90% CL upper limit $\mu_{\rm up}$ as a function of the background $b$ for $n = 0$ in the Unified Approach ($\mu_{\rm best}^{\rm min} = 0$,
solid line) and in the BMUA for $\mu_{\rm best}^{\rm min} = 1$ (dashed line) and $\mu_{\rm best}^{\rm min} = 2$ (dotted line).



4. A brutal modification of the Unified Approach 

In the Unified Approach $\mu_{\rm best}$ is non-negative and equal to zero for $n \le b$. If $\mu_{\rm best}$ is forced to be always
bigger than zero, the $n$'s smaller than $b$ have rank higher than in the Unified Approach. As a consequence,
the decrease of the upper limit $\mu_{\rm up}$ as $b$ increases is weakened. This is illustrated by a "Brutally Modified
Unified Approach" (BMUA) in which we take

$$ \mu_{\rm best} = {\rm Max}[\mu_{\rm best}^{\rm min}, n-b] \,, \qquad (6) $$

where $\mu_{\rm best}^{\rm min}$ is a positive real number.

In Fig. 4 I plotted the confidence belts for $\mu_{\rm best}^{\rm min} = 0$ (solid lines), that corresponds to the Unified
Approach, $\mu_{\rm best}^{\rm min} = 1$ (dashed lines) and $\mu_{\rm best}^{\rm min} = 2$ (dotted lines), for $b = 10$. One can see that in the
BMUA the upper limits of the confidence intervals are considerably higher than in the Unified Approach.
The behavior of $\mu_{\rm up}$ as a function of $b$ for $n = 0$ is shown in Fig. 5, from which it is clear that the
decrease of $\mu_{\rm up}$ when $b$ increases is much weaker in the BMUA (dashed and dotted lines) than in the
Unified Approach (solid line) and it is almost absent for $\mu_{\rm best}^{\rm min} \gtrsim 2$.

Let me emphasize that 

1. The BMUA is a statistically correct Frequentist method and coverage is satisfied.

2. In the BMUA one obtains upper limits for $n < b$ and central confidence intervals for $n > b$, as in
the Unified Approach².

3. The BMUA method is not general (although it can be extended in an obvious way at least to the
case of a Gaussian variable with a physical boundary).

4. I am not proposing the BMUA! (But those that think that the upper limit for $n = 0$ should not
depend on $b$ may consider the possibility of using the BMUA with $\mu_{\rm best}^{\rm min} = 2$ instead of resorting
to more complicated methods that may even jeopardize the property of coverage³.)

As shown in Fig. 4, the right edge of the confidence belt in the BMUA is not very different from
the one in the Unified Approach. This is due to the fact that adding small values of $n$ with low probability
to the acceptance intervals has little effect. Moreover, it is clear that the acceptance interval for $\mu = 0$
is equal for all Frequentist methods with correct coverage that unify the treatment of upper confidence
limits and two-sided confidence intervals.
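The effect of a positive floor on $\mu_{\rm best}$ can be seen directly on the ranks. The sketch below evaluates the likelihood ratio of Eq. (2) with $\mu_{\rm best}$ taken from Eq. (6) and counts how many values of $n$ are ranked above a given one. The helper `rank_position`, the parameter name `mu_best_min`, and the cutoff `n_max` are illustrative choices made here, and $b > 0$ is assumed.

```python
import math


def likelihood_ratio(n, mu, b, mu_best_min=0.0):
    """Eq. (2) with mu_best = Max[mu_best_min, n - b] as in Eq. (6).

    mu_best_min = 0 reproduces the Unified Approach ordering of Eq. (3).
    Assumes b > 0.
    """
    mu_best = max(mu_best_min, n - b)
    return ((mu + b) / (mu_best + b)) ** n * math.exp(mu_best - mu)


def rank_position(n, mu, b, mu_best_min=0.0, n_max=100):
    """Number of values n' in [0, n_max] ranked strictly above n."""
    reference = likelihood_ratio(n, mu, b, mu_best_min)
    return sum(
        1 for other in range(n_max + 1)
        if likelihood_ratio(other, mu, b, mu_best_min) > reference
    )


if __name__ == "__main__":
    mu, b = 3.0, 10.0
    for floor in (0.0, 1.0, 2.0):
        print(floor,
              likelihood_ratio(0, mu, b, floor),
              rank_position(0, mu, b, floor))
```

With the floor switched on, $n = 0$ moves up in the ordering, so it stays inside the acceptance interval up to larger values of $\mu$ and the upper limit is less stringent.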



5. Bayesian Ordering 

An elegant, natural and general way to obtain automatically $\mu_{\rm best}^{\rm min} > 0$ is given by the Bayesian Ordering
method [Q], in which $\mu_{\rm best}$ is replaced by the Bayesian expectation value for $\mu$, $\mu_{\rm B}$.

Choosing a natural flat prior, the Bayesian expectation value for $\mu$ in a Poisson process with
background is given by

$$ \mu_{\rm B}(n,b) = n + 1 - \frac{\sum_{k=0}^{n} k \, b^k / k!}{\sum_{k=0}^{n} b^k / k!} = n + 1 - b \, \frac{\sum_{k=0}^{n-1} b^k / k!}{\sum_{k=0}^{n} b^k / k!} \,. \qquad (8) $$

The obvious inequality $\sum_{k=0}^{n} k \, b^k/k! \le n \sum_{k=0}^{n} b^k/k!$ implies that $\mu_{\rm B} \ge 1$. Therefore, the reference
value for $\mu$ in the likelihood ratio

$$ R(n,\mu,b) = \frac{P(n|\mu,b)}{P(n|\mu_{\rm B},b)} = \left( \frac{\mu+b}{\mu_{\rm B}+b} \right)^n e^{\mu_{\rm B}-\mu} \,, \qquad (9) $$

that determines the construction of the acceptance intervals as in the Unified Approach, is bigger than or
equal to one. As a consequence, the decrease of the upper confidence limit $\mu_{\rm up}$ for a given $n$ when the
expected background $b$ increases is significantly weaker than in the Unified Approach, as illustrated in
Fig. 3B.
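The properties of $\mu_{\rm B}$ used above are easy to verify numerically. The following sketch evaluates both forms of Eq. (8) and checks the bound $\mu_{\rm B} \ge 1$. The function names are illustrative, and the direct use of factorials limits $n$ to moderate values (roughly $n \le 150$) in this simple form.

```python
import math


def partial_exp_sum(b, n):
    """sum_{k=0}^{n} b^k / k! (returns 0 for n < 0)."""
    return math.fsum(b ** k / math.factorial(k) for k in range(n + 1))


def mu_bayes(n, b):
    """Bayesian expectation value of mu, first form of Eq. (8)."""
    weighted = math.fsum(k * b ** k / math.factorial(k) for k in range(n + 1))
    return n + 1 - weighted / partial_exp_sum(b, n)


def mu_bayes_second_form(n, b):
    """Second form of Eq. (8), with the sum in the numerator up to n - 1."""
    return n + 1 - b * partial_exp_sum(b, n - 1) / partial_exp_sum(b, n)


if __name__ == "__main__":
    for n, b in [(0, 5.0), (2, 50.0), (5, 5.0), (30, 1.0)]:
        print(n, b, mu_bayes(n, b), mu_bayes_second_form(n, b))
```

For $n = 0$ the sums collapse and $\mu_{\rm B} = 1$ for every $b$, while for $n$ much larger than $b$ the value approaches $n + 1 - b$.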

Figure 3C shows $\mu_{\rm up}$ as a function of $b$ in the Bayesian Theory with a flat prior and shortest
credibility intervals⁴. One can see that the behavior of $\mu_{\rm up}$ obtained with the Bayesian Ordering method
is intermediate between those in the Unified Approach and in the Bayesian Theory. Although one must
always remember that the statistical meaning of $\mu_{\rm up}$ is different in the two Frequentist methods (Unified
Approach and Bayesian Ordering) and in the Bayesian Theory, for scientists using these upper limits it
is often irrelevant how they have been obtained. Hence, I think that an approximate agreement between
Frequentist and Bayesian results is desirable.

2 For $n \le b + \mu_{\rm best}^{\rm min}$ we have $\mu_{\rm best} = \mu_{\rm best}^{\rm min}$ and the likelihood ratio (2) becomes

$$ R(n \le b + \mu_{\rm best}^{\rm min}, \mu, b) = \left( \frac{\mu+b}{\mu_{\rm best}^{\rm min}+b} \right)^n e^{\mu_{\rm best}^{\rm min}-\mu} \,. \qquad (7) $$

For $\mu < \mu_{\rm best}^{\rm min}$, we have $(\mu+b)/(\mu_{\rm best}^{\rm min}+b) < 1$ and $R(n \le b + \mu_{\rm best}^{\rm min}, \mu, b)$ decreases with increasing $n$. Let us consider now
$n > b + \mu_{\rm best}^{\rm min}$, for which $\mu_{\rm best} = n - b$ and the likelihood ratio (2) is given by the expression in Eq. (4). This expression has
a maximum for $n$ equal to one of the two integers closest to $\mu + b$. For $\mu < \mu_{\rm best}^{\rm min}$, this integer is the first one in the considered
range ($n > b + \mu_{\rm best}^{\rm min}$). Therefore, for sufficiently low values of $\mu$, $\mu < \mu_{\rm best}^{\rm min}$, the likelihood ratio (2) decreases monotonically
as $n$ increases. In this case, low values of $n$ have highest ranks and are guaranteed to lie in the confidence belt and the left edge
of the confidence belt must change its slope for $n \le \mu_{\rm best}^{\rm min} + b$ and intercept the $\mu$-axis at a positive value of $\mu$, as illustrated in
Fig. 4.

3 By the way, I think that coverage is the most important property of the Frequentist theory. If coverage is not satisfied the
results are statistically useless in the context of Frequentist theory.

4 In this case the posterior p.d.f. for $\mu$ is

$$ P(\mu|n,b) = (b+\mu)^n \, e^{-\mu} \left( n! \sum_{k=0}^{n} \frac{b^k}{k!} \right)^{-1} \,, \qquad (10) $$

From Eq. (8) one can see that

$$ n \gg b \;\Longrightarrow\; \mu_{\rm B}(n,b) \simeq n + 1 - b \simeq n \,, \qquad (12) $$
$$ n < b, \; b \gg 1 \;\Longrightarrow\; \mu_{\rm B}(n,b) \simeq 1 \,. \qquad (13) $$

Therefore, for $n \gtrsim b$ the confidence belt obtained with the Bayesian Ordering method is similar to that
obtained with the Unified Approach. The difference between the two methods shows up only for $n < b$.
This is illustrated in Figs. 6 and 7, that must be compared with the corresponding Figures 1 and 2 in
the Unified Approach. Notice that, as shown in Fig. 7, contrary to the Unified Approach, the Bayesian
Ordering method gives physically significant (non-zero-width) confidence intervals even for low values
of the confidence level $\alpha$.



6. Answers to some criticisms 

Criticism: Bayesian Ordering is a mixture of Frequentism and Bayesianism. The uncompromising
Frequentist cannot accept it.

No! It is a Frequentist method. 

Bayesian theory is only used for the choice of ordering in the construction of the acceptance
intervals, that in any case is subjective and beyond Frequentism (as, for example, the central interval
prescription or the Unified Approach method). The Bayesian method for such a subjective choice is
quite natural.

4 (continued) and the probability (degree of belief) that the true value of $\mu$ lies in the range $[\mu_1, \mu_2]$ is given by

$$ P(\mu \in [\mu_1,\mu_2] \,|\, n,b) = \left( e^{-\mu_1} \sum_{k=0}^{n} \frac{(b+\mu_1)^k}{k!} - e^{-\mu_2} \sum_{k=0}^{n} \frac{(b+\mu_2)^k}{k!} \right) \left( \sum_{k=0}^{n} \frac{b^k}{k!} \right)^{-1} \,. \qquad (11) $$

The shortest $100\alpha\%$ credibility intervals $[\mu_{\rm low}, \mu_{\rm up}]$ are obtained by choosing $\mu_{\rm low}$ and $\mu_{\rm up}$ such that
$P(\mu \in [\mu_{\rm low},\mu_{\rm up}] \,|\, n,b) = \alpha$ and $P(\mu_{\rm low}|n,b) = P(\mu_{\rm up}|n,b)$ if possible (with $\mu_{\rm low} > 0$), or $\mu_{\rm low} = 0$.

Fig. 8: 90% CL upper limit $\mu_{\rm up}$ as a function of the background $b$ for $n = 0$ (solid lines), $n = 5$ (dashed lines) and $n = 10$
(dotted lines).

If you belong to the Frequentist Orthodoxy (sort of religion!) and the word "Bayesian" gives you 
the creeps, you can change the name "Bayesian Ordering" into whatever you like and use its prescription 
for the construction of the acceptance intervals as a successful recipe. 

Criticism: In the Unified Approach (and maybe Bayesian Ordering?) the upper limit on $\mu$ goes to zero
for every $n$ as $b$ goes to infinity, so that a low fluctuation of the background entitles one to claim a very
stringent limit on the signal.

This is not true! 

One can see it⁵ by calculating the upper limit for $\mu$ as a function of $b$ for large values of $b$.
The result of such a calculation in the Unified Approach is shown in Fig. 8A, where the 90% CL upper
limit $\mu_{\rm up}$ is plotted as a function of $b$ in the interval $0 \le b \le 200$ for $n = 0$ (solid line), $n = 5$ (dashed
line) and $n = 10$ (dotted line). One can see that initially $\mu_{\rm up}$ decreases with increasing $b$, but it stabilizes
to about 0.8 for $b \gg n$, with fluctuations due to the discreteness of $n$. Figure 8B shows the same plot
obtained with the Bayesian Ordering. One can see that initially $\mu_{\rm up}$ decreases with increasing $b$, but less
steeply than in the Unified Approach, and it stabilizes to about 1.8. For comparison, in Fig. 8C I plotted
$\mu_{\rm up}$ as a function of $b$ in the Bayesian Theory with a flat prior and shortest credibility intervals. One can
see that the behavior of $\mu_{\rm up}$ in the three methods considered in Fig. 8 is rather similar.

Criticism: For $n = 0$ the upper limit $\mu_{\rm up}$ should be independent of the background $b$.

But for $n > 0$ the upper limit $\mu_{\rm up}$ always decreases with increasing $b$! It is true that for $n = 0$
one is sure that no background event as well as no signal has been observed. But this is just the effect
of a low fluctuation of the background that is present! Should we build a special theory for $n = 0$? I
think that this is not interesting in the Frequentist framework, because I guess that it leads necessarily to
a violation of coverage (that could be tolerated, but not welcomed, only if it is overcoverage).

I think that if one is so interested in having an upper limit $\mu_{\rm up}$ independent of the background $b$
for $n = 0$, one had better embrace the Bayesian theory (see Fig. 3C, Fig. 8C and Ref. [14]), which, by the
way, may present many other attractive qualities (see, for example, [[[]]).

5 In the Unified Approach the likelihood ratio for $n \le b$ is given by the expression in Eq. (5), that tends to $e^{-\mu}$ for $b \gg n$
and small $\mu$. For $\mu \lesssim 1$, $e^{-\mu} \sim 1$ and all $n \ll b$ have rank close to maximum. For $n > b$ the likelihood ratio is given by the
expression in Eq. (4). For large values of $b$, taking into account that $n > b$, we have $1 + \ln(\mu+b) - \ln n \simeq \ln b - \ln n < 0$
and $\mu + b \simeq b$, which imply that $R(n > b, \mu, b) \lesssim e^{-b} \to 0$. So the rank drops rapidly for $n > b$. Therefore, for small
values of $\mu$ the $n$'s much smaller than $b$ have highest rank. Since they have also very small probability, they all lie comfortably
in the confidence belt, if the confidence level $\alpha$ is sufficiently large ($\alpha \gtrsim 0.60$).
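The independence of the Bayesian limit from $b$ for $n = 0$ can be checked from the posterior probability of Eq. (11). The sketch below solves $P(\mu \in [0, \mu_{\rm up}] \,|\, n, b) = \alpha$ by bisection. It computes only the one-sided limit, which coincides with the shortest credibility interval when the posterior (10) is monotonically decreasing, i.e. for $n \le b$; the function names and the bisection are choices made here.

```python
import math


def posterior_prob(mu1, mu2, n, b):
    """Posterior probability that mu lies in [mu1, mu2], Eq. (11)."""
    def term(mu):
        return math.exp(-mu) * math.fsum(
            (b + mu) ** k / math.factorial(k) for k in range(n + 1)
        )

    norm = math.fsum(b ** k / math.factorial(k) for k in range(n + 1))
    return (term(mu1) - term(mu2)) / norm


def bayes_upper_limit(n, b, alpha=0.90):
    """Solve P(mu in [0, mu_up] | n, b) = alpha for mu_up by bisection."""
    lo, hi = 0.0, 1.0
    while posterior_prob(0.0, hi, n, b) < alpha:
        hi *= 2.0  # expand the bracket until it contains the solution
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if posterior_prob(0.0, mid, n, b) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for b in (0.0, 5.0, 10.0):
        print(b, bayes_upper_limit(0, b), bayes_upper_limit(3, b))
```

For $n = 0$ the sums reduce to one and the limit is $-\ln(1-\alpha)$, about 2.30 at 90%, whatever the value of $b$; for $n > 0$ the limit still decreases as $b$ grows.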

Criticism: A (worse) experiment with larger background $b$ should not give a smaller upper limit $\mu_{\rm up}$ for
the same number $n$ of observed events.

But, as shown in Fig. 3, this always happens! Notice that it happens both for $n > b$ (dotted part of
lines) and for $n < b$ (solid part of lines), in Frequentist methods as well as in the Bayesian Theory (for
$n > 0$). As far as I know, nobody questions the decrease of $\mu_{\rm up}$ as $b$ is increased if $n > b$. So why should
we question the same behavior when $n < b$? The reason for this behavior is simple: the observation of a
given number $n$ of events has the same probability if the background is small and the signal is
large or the background is large and the signal is small.

I think that it is physically desirable that an experiment with a larger background does not give a
much smaller upper limit for the same number of observed events, but a smaller upper limit is allowed
by statistical fluctuation. Indeed,

upper limits (as confidence intervals, etc.) are statistical quantities that must fluctuate!

I think that the current race of experiments to find the most stringent upper limit is bad⁶, because it
induces people to think that limits are fixed and certain. Instead, everybody should understand that

a better experiment can sometimes give a worse upper limit because of statistical fluctuations
and there is nothing wrong about it!

7. Conclusions 

In this report I have shown that the necessity to choose a specific Frequentist method, among several
available, does not introduce any degree of subjectivity from a statistical point of view (Section 2).
In other words, all Frequentist methods are statistically equivalent.

However, the physical significance of confidence intervals obtained with different methods is different
and scientists interested in obtaining reliable and useful information on the characteristics of the
real world must worry about this problem. Obtaining empty or very small confidence intervals for a
physical quantity as a result of a statistical procedure is useless. Sometimes it is even dangerous to
present such results, that lead non-experts in statistics (and sometimes experts too) to false beliefs.

In Section 3 I have discussed some virtues and shortcomings of the Unified Approach [Q]. These
shortcomings are ameliorated in the Bayesian Ordering method [Bj, discussed in Section 5, that is natural,
relatively easy, and leads to more reliable upper limits.

In conclusion, I would like to emphasize the following considerations: 

• One must always remember that, in order to have coverage, the choice of a specific Frequentist 
method must be done independently of the knowledge of the data. 

• Finding some examples in which a method fails does not imply that it should not be adopted in the 
cases in which it performs well. 

• Since all Frequentist methods are statistically equivalent,

there is no need for a general Frequentist method!
In each case one can choose the method that works better (basing the judgment on ease of use,
meaningfulness of limits, etc.). Complicated methods with a wider range of applicability are theoretically
interesting, but not attractive in practice.

• Some think that the physics community should agree on a standard statistical method (see,
for example, [|l^]). In that case, it is clear that this method must be always applicable. But this
is not the case, for example, of the Unified Approach, as shown in [16]. Although the Bayesian
Ordering method has not been submitted to a similar thorough examination, I doubt that it is
generally applicable.

6 It is surprising that even at the Panel Discussion [|lj] of this Workshop (full of experts) the statement "the experimenters
like to quote the smallest bound they can get away with" was not strongly criticized. What is the purpose of experiments? (A)
Give the smallest bound. (B) Give useful and reliable information. If your answer is (A) and you are an experimentalist, I
suggest that you stop deceiving us and move to some more rewarding cheating activity.

I do not see why experiments that explore different physics and use different experimental techniques
should all use the same statistical method (except a possible ignorance of statistics and
blind belief in "authorities").
I would recommend that

instead of wasting time on useless characteristics such as generality, the physics community
should worry about the usefulness and credibility of experimental results.